
    The Robust Reading Competition Annotation and Evaluation Platform

    The ICDAR Robust Reading Competition (RRC), initiated in 2003 and re-established in 2011, has become a de facto evaluation standard for robust reading systems and algorithms. Concurrent with its second incarnation in 2011, a continuous effort was started to develop an on-line framework to facilitate the hosting and management of competitions. This paper outlines the Robust Reading Competition Annotation and Evaluation Platform, the backbone of the competitions. The platform is a modular framework, fully accessible through on-line interfaces. It comprises a collection of tools and services for managing all processes involved in defining and evaluating a research task, from dataset definition to annotation management, evaluation specification, and results analysis. Although the framework has been designed with robust reading research in mind, many of the provided tools are generic by design. All aspects of the RRC Annotation and Evaluation Platform are available for research use.
    Comment: 6 pages, accepted to DAS 2018.

    Sparse Radial Sampling LBP for Writer Identification

    In this paper we present the use of Sparse Radial Sampling Local Binary Patterns, a variant of Local Binary Patterns (LBP), for text-as-texture classification. By adapting and extending the standard LBP operator to the particularities of text, we obtain a generic text-as-texture classification scheme and apply it to writer identification. In experiments on the CVL and ICDAR 2013 datasets, the proposed feature set demonstrates state-of-the-art (SOA) performance. Among the SOA, the proposed method is the only one based on dense extraction of a single local feature descriptor. This makes it fast and applicable at the earliest stages of a DIA pipeline, without the need for segmentation, binarization, or extraction of multiple features.
    Comment: Submitted to the 13th International Conference on Document Analysis and Recognition (ICDAR 2015).
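
    A minimal sketch of the generic text-as-texture pipeline, for illustration only: dense LBP codes pooled into an L1-normalized histogram that serves as the page descriptor. It uses scikit-image's standard LBP operator as a stand-in; the Sparse Radial Sampling variant changes how points on each radius are sampled and is not reproduced here.

    import numpy as np
    from skimage.feature import local_binary_pattern

    def page_descriptor(gray, points=8, radius=3):
        """Densely extract LBP codes and pool them into a normalized histogram."""
        codes = local_binary_pattern(gray, P=points, R=radius, method="uniform")
        n_bins = points + 2  # the "uniform" mapping yields P + 2 distinct codes
        hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
        return hist / max(hist.sum(), 1)  # page-level texture descriptor

    Writer identification then reduces to comparing these per-page histograms, e.g. by nearest-neighbour search with a chi-squared or cosine distance.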

    Hybrid Training Data for OCR of Historical Texts

    Current optical character recognition (OCR) systems commonly make use of recurrent neural networks (RNN) that process whole text lines. Such systems avoid the task of character segmentation necessary for character-based approaches. A disadvantage of this approach is the need for a large amount of annotated data. This can be addressed by using generated synthetic data instead of costly manually annotated data. Unfortunately, such data is often not suitable for historical documents, particularly for quality reasons. This work presents a hybrid approach for generating annotated data for OCR at a low cost. We first collect a small dataset of isolated characters from historical document images. Then, we generate historical-looking text lines from the collected characters. Another contribution lies in the design and implementation of an OCR system based on a convolutional-LSTM network. We first pre-train this system on the hybrid data; afterwards, the network is fine-tuned with real printed text lines. We demonstrate that this training strategy is efficient for obtaining state-of-the-art results, and we show that the score of the proposed system is comparable to or even better than that of several state-of-the-art systems.
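
    As a rough illustration of the data generation step, the sketch below composes a synthetic line image from previously collected character crops. The char_images dict and all sizing choices here are hypothetical; the actual generator additionally models baselines, spacing and degradation so that the lines look historical.

    import random
    import cv2
    import numpy as np

    def render_line(text, char_images, height=48, gap=3):
        """Compose a synthetic text-line image from real character crops."""
        parts = []
        for ch in text:
            crops = char_images.get(ch)  # list of 2D uint8 crops for this character
            if not crops:
                continue  # no sample was collected for this character
            crop = random.choice(crops)
            w = max(1, round(crop.shape[1] * height / crop.shape[0]))
            parts.append(cv2.resize(crop, (w, height)))  # unify the line height
            parts.append(np.full((height, gap), 255, dtype=np.uint8))  # whitespace gap
        if not parts:
            return np.full((height, 1), 255, dtype=np.uint8)
        return np.concatenate(parts, axis=1)  # pairs with `text` as a training sample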

    Writer Retrieval and Writer Identification in Greek Papyri

    The analysis of digitized historical manuscripts is typically addressed by paleographic experts. Writer identification refers to the classification of known writers, while writer retrieval seeks to find the writer by means of image similarity in a dataset of images. While automatic writer identification/retrieval methods already provide promising results for many historical document types, papyri data is very challenging due to fiber structures and severe artifacts. An important step towards improved writer identification is therefore the preprocessing and feature sampling process. We investigate several methods and show that a good binarization is key to improved writer identification in papyri writings. We focus mainly on writer retrieval using unsupervised feature methods, either traditional or self-supervised; our approach is, however, also comparable to the state-of-the-art supervised deep-learning-based method in the case of writer classification/re-identification.
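
    A minimal illustration of the preprocessing point, using plain OpenCV as a stand-in for the stronger binarization methods compared in the paper: smoothing before Otsu thresholding suppresses some of the papyrus fiber texture that would otherwise leak into the sampled features.

    import cv2

    def binarize(gray):
        """Baseline blur + Otsu binarization (not the paper's exact method)."""
        blur = cv2.GaussianBlur(gray, (5, 5), 0)
        _, bw = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return bw  # local features are then sampled on or around the ink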

    Deep Generalized Max Pooling

    Global pooling layers are an essential part of convolutional neural networks (CNN). Global average pooling or global max pooling is commonly used to convert the convolutional features of variable-size images into a fixed-size embedding. However, both pooling layer types are computed spatially independently. In contrast, we propose Deep Generalized Max Pooling, which balances the contribution of all activations of a spatially coherent region by re-weighting all descriptors so that the impact of frequent and rare ones is equalized. We show that this layer is superior to both average and max pooling on the classification of Latin medieval manuscripts (CLAMM’16, CLAMM’17), as well as on writer identification (Historical-WI’17).
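
    A sketch of the layer's core computation under the usual generalized max pooling formulation: the pooled vector is chosen so that every local descriptor has (approximately) the same dot product with it, a ridge-regression problem with the closed-form dual solution used below. Treat this as an assumption-laden illustration, not the authors' exact code.

    import torch

    class DeepGMP(torch.nn.Module):
        """Pool (B, C, H, W) features to (B, C), equalizing descriptor impact."""
        def __init__(self, lam=1e3):
            super().__init__()
            self.lam = lam  # ridge regularization strength

        def forward(self, x):
            b, c, h, w = x.shape
            X = x.flatten(2).transpose(1, 2)            # (B, N, C) local descriptors
            K = X @ X.transpose(1, 2)                   # (B, N, N) Gram matrix
            eye = torch.eye(h * w, device=x.device).expand(b, -1, -1)
            ones = torch.ones(b, h * w, 1, device=x.device)
            alpha = torch.linalg.solve(K + self.lam * eye, ones)  # (K + lam*I)a = 1
            return (X.transpose(1, 2) @ alpha).squeeze(-1)        # xi = X^T a, (B, C)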

    Differentiable data augmentation with Kornia

    Work presented at the 1st Workshop on Differentiable Vision, Graphics, and Physics Applied to Machine Learning (DiffCVGP), held virtually in December 2020.
    In this paper we present a review of the Kornia [1, 2] differentiable data augmentation (DDA) module for both spatial (2D) and volumetric (3D) tensors. This module leverages differentiable computer vision solutions from Kornia, with the aim of integrating data augmentation (DA) pipelines and strategies into existing PyTorch components (e.g. autograd for differentiability, optim for optimization). In addition, we provide a benchmark comparing different DA frameworks and a short review of a number of approaches that make use of Kornia DDA.
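
    A small usage sketch against the current kornia.augmentation API (class names may have shifted since the paper): because the augmentations are built from differentiable Kornia operations, a loss computed on the augmented batch backpropagates through the DA pipeline to the input.

    import torch
    import kornia.augmentation as K

    aug = K.AugmentationSequential(
        K.RandomAffine(degrees=15.0, p=1.0),
        K.ColorJitter(brightness=0.2, contrast=0.2, p=1.0),
    )

    x = torch.rand(4, 3, 64, 64, requires_grad=True)  # toy image batch
    loss = aug(x).mean()   # the augmented batch stays on the autograd tape
    loss.backward()        # gradients reach the un-augmented input
    print(x.grad.shape)    # torch.Size([4, 3, 64, 64])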